AAN+: Generalized Average Attention Network for Accelerating Neural Transformer
Authors
Abstract
Transformer benefits from the high parallelization of attention networks in fast training, but it still suffers from slow decoding, partially due to the linear dependency O(m) of the decoder self-attention on previous target words at inference. In this paper, we propose a generalized average attention network (AAN+) aimed at speeding up decoding by reducing the dependency from O(m) to O(1). We find that the learned self-attention weights follow some patterns that can be approximated via a dynamic structure. Based on this insight, we develop AAN+, extending our previously proposed average attention (Zhang et al., 2018a, AAN) to support more general position- and content-based attention patterns. AAN+ only requires maintaining a small constant number of hidden states during decoding, ensuring its O(1) dependency. We apply AAN+ as a drop-in replacement of the decoder self-attention and conduct experiments on machine translation (with diverse language pairs), table-to-text generation, and document summarization. With masking tricks and dynamic programming, AAN+ enables decoding sentences around 20% faster without largely compromising training speed and performance. Our results further reveal the importance of localness (neighboring words) and the capability of modeling long-range dependency.
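The O(1) dependency claimed in the abstract comes from the average attention idea: the per-step context is a running average of previous target-word representations, so only one accumulator needs to be kept across decoding steps. The following is a minimal sketch of that core mechanism only (assumption: the actual AAN/AAN+ layers add gating, a feed-forward transform, and the generalized position/content patterns, which are omitted here):

```python
import numpy as np

class AverageAttentionState:
    """Incremental average-attention sketch: keeps a single running sum,
    so each decoding step costs O(1) time and memory regardless of the
    number of previously generated target words."""

    def __init__(self, dim):
        self.running_sum = np.zeros(dim)
        self.step = 0

    def update(self, y_t):
        # Consume the representation of the current target word and
        # return the averaged context over all words seen so far.
        self.step += 1
        self.running_sum += y_t
        return self.running_sum / self.step

state = AverageAttentionState(dim=4)
ctx1 = state.update(np.array([1.0, 0.0, 0.0, 0.0]))
ctx2 = state.update(np.array([0.0, 1.0, 0.0, 0.0]))
# ctx2 is the mean of the two word vectors: [0.5, 0.5, 0.0, 0.0]
```

During training the same averages can be computed for all positions at once with a masked cumulative sum, which is why the approach does not compromise training parallelism.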
Similar Papers
Accelerating Convolutional Neural Network Systems
Convolutional Neural Networks have recently been shown to be highly effective classifiers for image and speech data. Due to the large volume of data required to build useful models, and the complexity of the models themselves, efficiency has become one of the primary concerns. This work shows that frequency domain methods can be utilised to accelerate the performance of training, inference, and sl...
Accelerating Recurrent Neural Network Training
An efficient algorithm for recurrent neural network training is presented. The approach increases the training speed for tasks where the length of the input sequence may vary significantly. The proposed approach is based on optimal batch bucketing by input sequence length and data parallelization on multiple graphics processing units. The baseline training performance without sequence bucket...
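The bucketing idea in the blurb above can be sketched briefly: sequences are grouped so that items in the same batch have similar lengths, minimizing wasted computation on padding. This is a simplified illustration (assumption: the paper optimizes bucket boundaries; here lengths are just rounded up to a fixed bucket granularity):

```python
from collections import defaultdict

def bucket_by_length(sequences, bucket_size=8):
    """Group sequences into buckets keyed by padded length, where the
    padded length is the sequence length rounded up to a multiple of
    bucket_size. Batches drawn from one bucket need little padding."""
    buckets = defaultdict(list)
    for seq in sequences:
        padded_len = ((len(seq) + bucket_size - 1) // bucket_size) * bucket_size
        buckets[padded_len].append(seq)
    return dict(buckets)

seqs = [[0] * 3, [0] * 5, [0] * 9, [0] * 16]
buckets = bucket_by_length(seqs)
# lengths 3 and 5 share the length-8 bucket; 9 and 16 go to the length-16 bucket
```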
A synergetic neural network-genetic scheme for optimal transformer construction
In this paper, a combined neural network and an evolutionary programming scheme is proposed to improve the quality of wound core distribution transformers in an industrial environment by exploiting information derived from both the construction and transformer design phase. In particular, the neural network architecture is responsible for predicting transformer iron losses prior to their assemb...
Accelerating the Super-Resolution Convolutional Neural Network
As a successful deep model applied in image super-resolution (SR), the Super-Resolution Convolutional Neural Network (SRCNN) [1, 2] has demonstrated superior performance to the previous hand-crafted models both in speed and restoration quality. However, the high computational cost still hinders it from practical usage that demands real-time performance (24 fps). In this paper, we aim at accel...
A generalized ABFT technique using a fault tolerant neural network
In this paper we first show that the standard BP algorithm cannot yield a uniform information distribution over the neural network architecture. A measure of sensitivity is defined to evaluate the fault tolerance of a neural network, and then we show that the sensitivity of a link is closely related to the amount of information that passes through it. Based on this assumption, we prove that the distribu...
Journal
Journal Title: Journal of Artificial Intelligence Research
Year: 2022
ISSN: 1076-9757, 1943-5037
DOI: https://doi.org/10.1613/jair.1.13896